---
title: Portable Prediction Server running modes
description: Learn how to configure the Portable Prediction Server for single-model or multi-model running mode.

---

# Portable Prediction Server running modes {: #portable-prediction-server-running-modes }

The server supports two running modes: single-model (SM) and multi-model (MM). Use SM mode when only a single model package is mounted into the `/opt/ml/model` directory of the Docker container. Use MM mode in all other cases. Both modes return compatible predictions, but SM mode provides a simplified HTTP API that does not require you to identify a model package on disk, and it preloads the model into memory on start.

The Docker container filesystem should match one of the following layouts.

For SM mode:

```
/opt/ml/model/
└── model_5fae9a023ba73530157ebdae.mlpkg
```

For MM mode:

```
/opt/ml/model/
├── fraud
│   └── model_5fae9a023ba73530157ebdae.mlpkg
└── revenue
    ├── config.yml
    └── revenue-estimate.mlpkg
```
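
The layout rule above can be checked locally before starting the container. The following is a minimal sketch, not the server's actual detection logic: it assumes that a directory whose root holds exactly one `.mlpkg` file corresponds to SM mode and that anything else corresponds to MM mode. The function name `guess_running_mode` is hypothetical.

```python
from pathlib import Path


def guess_running_mode(model_dir: str) -> str:
    """Guess the running mode implied by a local model directory.

    Assumption: exactly one .mlpkg file directly in the root means
    single-model mode; every other layout means multi-model mode.
    """
    root = Path(model_dir)
    packages_in_root = [
        entry for entry in root.iterdir()
        if entry.is_file() and entry.suffix == ".mlpkg"
    ]
    return "single-model" if len(packages_in_root) == 1 else "multi-model"
```

Call it with the host directory you plan to mount at `/opt/ml/model`.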

### HTTP API (single-model) {: #http-api-single-model }

When running in single-model mode, the Docker image exposes three HTTP endpoints:

* `POST /predictions` scores a given dataset.
* `GET /info` returns information about the loaded model.
* `GET /ping` ensures the tech stack is up and running.

!!! note
     Prediction routes accept scoring datasets only as comma-delimited CSV or JSON records. The maximum payload size is 50 MB.

``` sh
curl -X POST http://<ip>:8080/predictions \
    -H "Content-Type: text/csv" \
    --data-binary @path/to/scoring.csv
{
  "data": [
    {
      "predictionValues": [
        {"value": 0.250833758, "label": "yes"},
        {"value": 0.749166242, "label": "no"}
      ],
      "predictionThreshold": 0.5,
      "prediction": 0.0,
      "rowId": 0
    }
  ]
}
```

If CSV is the preferred output, request it using the `Accept: text/csv` HTTP header.

``` sh
curl -X POST http://<ip>:8080/predictions \
    -H "Accept: text/csv" \
    -H "Content-Type: text/csv" \
    --data-binary @path/to/scoring.csv
<target>_yes_PREDICTION,<target>_no_PREDICTION,<target>_PREDICTION,THRESHOLD,POSITIVE_CLASS
0.250833758,0.749166242,0,0.5,yes
```
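
The same single-model request can be prepared from Python. This is a minimal sketch using only the standard library. The helper names `build_scoring_request` and `extract_predictions` are hypothetical, and the base URL in the commented usage is an assumed local address. The request is built but not sent, so the code runs without a container.

```python
import json
import urllib.request


def build_scoring_request(base_url: str, csv_bytes: bytes,
                          accept: str = "application/json") -> urllib.request.Request:
    """Build (but do not send) a POST /predictions request with a CSV body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predictions",
        data=csv_bytes,
        headers={"Content-Type": "text/csv", "Accept": accept},
        method="POST",
    )


def extract_predictions(response_body: str) -> list:
    """Return (rowId, prediction) pairs from a JSON response body."""
    payload = json.loads(response_body)
    return [(row["rowId"], row["prediction"]) for row in payload["data"]]


# Sending the request needs a running PPS container (address is an assumption):
# request = build_scoring_request("http://localhost:8080", csv_bytes)
# with urllib.request.urlopen(request) as response:
#     print(extract_predictions(response.read().decode("utf-8")))
```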

### HTTP API (multi-model) {: #http-api-multi-model }

In multi-model mode, the Docker image exposes the following endpoints:

* `POST /deployments/:id/predictions` scores a given dataset.
* `GET /deployments/:id/info` returns information about the loaded model.
* `POST /deployments/:id` uploads a model package to the container.
* `DELETE /deployments/:id` deletes a model package from the container.
* `GET /deployments` returns a list of model packages that are in the container.
* `GET /ping` ensures the tech stack is up and running.

The `:id` included in the `/deployments` routes above refers to the unique identifier for model packages on the disk. The ID is the directory name containing the model package. Therefore, if you have the following `/opt/ml/model` layout:

```
/opt/ml/model/
├── fraud
│   └── model_5fae9a023ba73530157ebdae.mlpkg
└── revenue
    ├── config.yml
    └── revenue-estimate.mlpkg
```

You may use `fraud` and `revenue` instead of `:id` in the `/deployments` set of routes.

!!! note
     Prediction routes accept scoring datasets only as comma-delimited CSV or JSON records. The maximum payload size is 50 MB.

``` sh
curl -X POST http://<ip>:8080/deployments/revenue/predictions \
    -H "Content-Type: text/csv" \
    --data-binary @path/to/scoring.csv
{
  "data": [
    {
      "predictionValues": [
        {"value": 0.250833758, "label": "yes"},
        {"value": 0.749166242, "label": "no"}
      ],
      "predictionThreshold": 0.5,
      "prediction": 0.0,
      "rowId": 0
    }
  ]
}
```
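
Because the `:id` is the directory name, the prediction URLs can be derived from a local copy of the model directory. The sketch below is illustrative only. The helper names are hypothetical, and it assumes every subdirectory containing a `.mlpkg` file is a valid ID. On a running container, `GET /deployments` remains the authoritative list.

```python
from pathlib import Path
from urllib.parse import quote


def list_deployment_ids(model_dir: str) -> list:
    """Return names of subdirectories that hold at least one .mlpkg file."""
    root = Path(model_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and any(child.glob("*.mlpkg"))
    )


def predictions_url(base_url: str, deployment_id: str) -> str:
    """Build the multi-model prediction URL for one directory name."""
    encoded_id = quote(deployment_id, safe="")  # percent-encode unusual names
    return f"{base_url.rstrip('/')}/deployments/{encoded_id}/predictions"
```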

## Monitoring {: #monitoring }

!!! note
      Before proceeding, be sure to configure monitoring for the PPS container. See the [Environment Variables](#environment-variables) and [Examples](#examples) sections for details. To use the [monitoring agent](mlops-agent/index), you need to configure the [agent spoolers](spooler) as well.

You can monitor prediction statistics such as [data drift](data-drift) and [accuracy](accuracy-settings) by [creating an external deployment](deploy-external-model) in the deployment inventory.

To connect your model package to a deployment, provide the ID of the deployment that should host your prediction statistics.

In single-model (SM) mode, provide the deployment ID via the `MLOPS_DEPLOYMENT_ID` environment variable. In multi-model (MM) mode, prepare a `config.yml` file with the desired `deployment_id` value and place it alongside the model package:

```yaml
deployment_id: 5fc92906ad764dde6c3264fa
```

To track accuracy, [configure it](accuracy-settings) for the deployment, and then provide the following extra settings for the running model.

For SM mode, set the following environment variables:

* `MLOPS_ASSOCIATION_ID_COLUMN=transaction_country` (required)
* `MLOPS_ASSOCIATION_ID_ALLOW_MISSING_VALUES=false` (optional, default=`false`)


For MM mode, set the following properties in `config.yml`:

```yaml
association_id_settings:
  column_name: transaction_country
  allow_missing_values: false
```
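
The two ways of passing the same monitoring settings can be generated from one set of inputs. This is a minimal sketch with hypothetical helper names. It writes YAML by hand as plain scalars, which assumes the values contain no characters that need quoting. It covers only the settings described in this section.

```python
def render_mm_config(deployment_id: str, association_id_column: str = None,
                     allow_missing_values: bool = False) -> str:
    """Render config.yml content for multi-model mode."""
    lines = [f"deployment_id: {deployment_id}"]
    if association_id_column is not None:
        lines += [
            "association_id_settings:",
            f"  column_name: {association_id_column}",
            f"  allow_missing_values: {str(allow_missing_values).lower()}",
        ]
    return "\n".join(lines) + "\n"


def sm_env_vars(deployment_id: str, association_id_column: str = None,
                allow_missing_values: bool = False) -> dict:
    """Return the equivalent environment variables for single-model mode."""
    env = {"MLOPS_DEPLOYMENT_ID": deployment_id}
    if association_id_column is not None:
        env["MLOPS_ASSOCIATION_ID_COLUMN"] = association_id_column
        env["MLOPS_ASSOCIATION_ID_ALLOW_MISSING_VALUES"] = str(allow_missing_values).lower()
    return env
```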

## HTTPS support {: #https-support }

!!! info "Availability information"
    If you are running PPS images that were downloaded previously, these parameters will not be available until the PPS image is manually updated:

    * Managed AI Platform (SaaS): starting Aug 2021
    * Self-Managed AI Platform: starting v7.2

By default, PPS serves predictions over an *insecure* listener on port `8080` (clear text HTTP over TCP).
You can also serve predictions over a *secure* listener on port `8443` (HTTP over TLS/SSL, or simply HTTPS). When the secure listener is enabled, the insecure listener becomes unavailable.

!!! note
    You cannot configure PPS to be available on both ports simultaneously; it is either HTTP on `8080` or HTTPS on `8443`.

The configuration is accomplished using the environment variables described below:

* `PREDICTION_API_TLS_ENABLED`: The master flag that enables the HTTPS listener on port `8443` and disables the HTTP listener on port `8080`.
	* **Default**: false (HTTPS disabled)
	* **Valid values** (case-insensitive):

        | Parameter value |  Interpretation |
        |-----------------|-------------|
        | true, yes, y, 1  | true |
        | false, no, n, 0  | false |

    !!! note
        The flag value must be interpreted as `true` to enable TLS. All other `PREDICTION_API_TLS_*` environment variables (if passed) are ignored if this setting is not enabled.

* `PREDICTION_API_TLS_CERTIFICATE`: PEM-formatted content of the TLS/SSL certificate.
	 * **Required**: Yes if `PREDICTION_API_TLS_ENABLED` is `true`, otherwise no.
	 * **See also**: [NGINX SSL certificate documentation](http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_certificate){ target=_blank }

* `PREDICTION_API_TLS_CERTIFICATE_KEY`: PEM-formatted content of the *secret* key of the TLS/SSL certificate.
	 * **Required**: Yes if `PREDICTION_API_TLS_ENABLED` is `true`, otherwise no.
	 * **See also**: [NGINX SSL certificate key documentation](http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_certificate_key){ target=_blank }

* `PREDICTION_API_TLS_CERTIFICATE_KEY_PASSWORD`: Passphrase for the *secret* certificate key passed in `PREDICTION_API_TLS_CERTIFICATE_KEY`.
	 * **Required**: Yes, only if a certificate key was created with a passphrase.

* `PREDICTION_API_TLS_PROTOCOLS`: Encryption protocol implementation(s) to use.
    * **Default**: `TLSv1.2 TLSv1.3`
    * **Valid values**: `SSLv2`|`SSLv3`|`TLSv1`|`TLSv1.1`|`TLSv1.2`|`TLSv1.3`, or any space-separated combination of these values.

    !!! warning
        As of August 2021, all implementations except `TLSv1.2` and `TLSv1.3` are considered deprecated and/or insecure. DataRobot highly recommends using only these implementations. New installations may consider using `TLSv1.3` exclusively as it is the most recent and secure TLS version.

* `PREDICTION_API_TLS_CIPHERS`: List of cipher suites to use.
    * **Default**: [Mandatory TLSv1.3 ciphers](https://datatracker.ietf.org/doc/html/rfc8446#section-9.1){ target=_blank } and [recommended TLSv1.2 ciphers](https://datatracker.ietf.org/doc/html/rfc7525#section-4.2){ target=_blank }
	 * **Required**: No.
	 * **Valid values**: See [OpenSSL syntax](https://www.openssl.org/docs/man1.1.1/man1/ciphers.html){ target=_blank } for cipher suites.

    !!! warning
        TLS support is an advanced feature. The cipher suites list has been carefully selected to follow the latest recommendations and current best practices. DataRobot does not recommend overriding it.
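
To catch configuration mistakes before the container starts, the `PREDICTION_API_TLS_ENABLED` values listed above can be checked in advance. The sketch below mirrors the table of valid values. It is not the server's own parser, the function names are hypothetical, and raising an error for unrecognized values is an assumption (this page does not say how the server treats them).

```python
TRUE_VALUES = {"true", "yes", "y", "1"}
FALSE_VALUES = {"false", "no", "n", "0"}


def parse_tls_enabled(raw):
    """Interpret a PREDICTION_API_TLS_ENABLED value (case-insensitive)."""
    if raw is None:
        return False  # documented default: HTTPS disabled
    value = raw.lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Assumption: treat anything outside the documented table as an error.
    raise ValueError(f"unrecognized flag value: {raw!r}")


def missing_tls_settings(env: dict) -> list:
    """List required TLS variables that are absent while TLS is enabled."""
    if not parse_tls_enabled(env.get("PREDICTION_API_TLS_ENABLED")):
        return []  # other PREDICTION_API_TLS_* variables are ignored
    required = ("PREDICTION_API_TLS_CERTIFICATE", "PREDICTION_API_TLS_CERTIFICATE_KEY")
    return [name for name in required if not env.get(name)]
```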


## Environment variables {: #environment-variables }

| Variable | Description | Default |
|----------|------------------|---|
| `PREDICTION_API_WORKERS` | Sets the number of workers to spin up. This option controls the number of HTTP requests the Prediction API can process simultaneously. Typically, set this to the number of CPU cores available for the container. | `1` |
| `PREDICTION_API_MODEL_REPOSITORY_PATH` | Sets the path to the directory where DataRobot should look for model packages. If the `PREDICTION_API_MODEL_REPOSITORY_PATH` points to a directory containing a single model package in its root, the single-model running mode is assumed by PPS. Multi-model mode is assumed otherwise.  | `/opt/ml/model/` |
| `PREDICTION_API_PRELOAD_MODELS_ENABLED` | Requires every worker to proactively preload all mounted models on start. This helps eliminate cache misses for the first requests after the server starts, while the cache is still "cold." See also `PREDICTION_API_SCORING_MODEL_CACHE_MAXSIZE` to completely eliminate cache misses.  | <ul><li>`false` for multi-model mode</li><li>`true` for single-model mode</li></ul> |
| `PREDICTION_API_SCORING_MODEL_CACHE_MAXSIZE` | The maximum number of scoring models to keep in each worker's RAM cache to avoid loading them on demand for each request. In practice, the default setting is low. If the server running PPS has enough RAM, you should set this to a value greater than the total number of premounted models to fully leverage caching and avoid cache misses. Note that each worker's cache is independent, so each model will be copied to each worker's cache. Also consider enabling `PREDICTION_API_PRELOAD_MODELS_ENABLED` for multi-model mode to avoid cache misses. | `4` |
| `PREDICTION_API_DEPLOYED_MODEL_RESOLVER_CACHE_TTL_SEC` | By default, the PPS periodically attempts to read deployment information from an `mlpkg` in case the package was re-uploaded via HTTP. If you are not planning to update the `mlpkg` after the PPS starts, consider setting this to `0` to disable deployment info cache invalidation. This helps reduce latency for some requests. | `60` |
| `PREDICTION_API_MONITORING_ENABLED` |   Sets whether DataRobot offloads data monitoring. If true, the Prediction API will offload monitoring data to the [monitoring agent](mlops-agent/index). | `false` |
| `PREDICTION_API_MONITORING_SETTINGS`|   Controls how to offload monitoring data from the Prediction API to the [monitoring agent](mlops-agent/index). Specify a list of [spooler configuration settings](spooler) as `key=value` pairs separated by semicolons. <br><br>Example for a filesystem spooler:<br>`PREDICTION_API_MONITORING_SETTINGS="spooler_type=filesystem;directory=/tmp;max_files=50;file_max_size=102400000"`<br><br>Example for an SQS spooler:<br> `PREDICTION_API_MONITORING_SETTINGS="spooler_type=sqs;sqs_queue_url=<SQS_URL>"`<br><br>In single-model mode, the `MLOPS_DEPLOYMENT_ID` and `MLOPS_MODEL_ID` variables are required; they are not required in multi-model mode. | `None` |
| `MONITORING_AGENT` | Sets whether the monitoring agent runs alongside the Prediction API. To use the [monitoring agent](mlops-agent/index), you need to configure the [agent spoolers](spooler). | `false`|
| `MONITORING_AGENT_DATAROBOT_APP_URL` | Sets the URI to the DataRobot installation (e.g., `https://app.datarobot.com`). | `None` |
| `MONITORING_AGENT_DATAROBOT_APP_TOKEN` | Sets a user token to be used with the DataRobot API. | `None` |
| `PREDICTION_API_TLS_ENABLED` | Sets the TLS listener master flag. Must be activated for the TLS listener to work.| `false` |
| `PREDICTION_API_TLS_CERTIFICATE` | Adds inline content of the certificate, in PEM format.  | `None` |
| `PREDICTION_API_TLS_CERTIFICATE_KEY` | Adds inline content of the certificate key, in PEM format. | `None` |
| `PREDICTION_API_TLS_CERTIFICATE_KEY_PASSWORD` | Adds plaintext passphrase for the certificate key file. | `None` |
| `PREDICTION_API_TLS_PROTOCOLS` | Overrides the TLS/SSL protocols. | `TLSv1.2 TLSv1.3` |
| `PREDICTION_API_TLS_CIPHERS` | Overrides default cipher suites. | Mandatory TLSv1.3, recommended TLSv1.2 |
| `PREDICTION_API_RPC_DUAL_COMPUTE_ENABLED` <br> _(Self-Managed 8.x installations)_ | For self-managed 8.x installations, this setting requires that the PPS run Python 2 *and* Python 3 interpreters. Then, the PPS automatically determines the version requirement based on which Python version the model was trained on. When this setting is enabled, `PYTHON3_SERVICES` is redundant and ignored. Note that this requires additional RAM to run both versions of the interpreter. | `False` |
| `PYTHON3_SERVICES` | Only enable this setting when the `PREDICTION_API_RPC_DUAL_COMPUTE_ENABLED` setting is disabled *and* each model was trained on Python 3. You can save approximately 400 MB of RAM by excluding the Python 2 interpreter service from the container. | `None` |

!!! important "Python support for self-managed installations"
    For Self-Managed installations before 9.0, the PPS _does not_ support Python 3 models by default; therefore, setting `PYTHON3_SERVICES` to `true` is required to use Python 3 models in those installations. 
    
    If you are running an 8.x version of DataRobot, you can enable "dual-compute mode" (`PREDICTION_API_RPC_DUAL_COMPUTE_ENABLED='true'`) to support both Python 2 and Python 3 models; however, this configuration requires an extra 400 MB of RAM. If you want to reduce the RAM footprint (and *all* models are either Python 2 or Python 3), you should avoid enabling "dual-compute mode." If all models are trained on Python 3, enable Python 3 services (`PYTHON3_SERVICES='true'`). If all models are trained on Python 2, there is no need to configure an additional environment variable, as the default interpreter is still Python 2.

##  Request parameters {: #request-parameters }

###  Headers {: #headers }

The PPS does not support authorization; therefore, `Datarobot-key` and `Authorization` are not needed.

|  Key | Type           | Description  | Example(s) |
|------|----------------|--------------|------------|
| `Content-Type`     | string | Required. Defines the request format. | <ul><li>`text/plain; charset=UTF-8`</li><li>`text/csv`</li><li>`application/json`</li><li>`multipart/form-data` (for files with data, i.e., .csv, .txt files)</li></ul> |
| `Content-Encoding` | string | Optional. Currently supports only `gzip`-encoding with the default data extension. | `gzip` |
| `Accept` | string | Optional. Controls the shape of the response schema. Currently JSON (default) and CSV are supported. See examples. |  <ul><li>`application/json` (default)</li><li>`text/csv` (for CSV output)</li></ul> |
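
The headers above can be combined to send a compressed CSV payload. The following is a minimal standard-library sketch. The helper name `build_gzip_request` is hypothetical, and the request is only built, not sent, so it runs without a container.

```python
import gzip
import urllib.request


def build_gzip_request(url: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build a POST request whose CSV body is gzip-compressed."""
    return urllib.request.Request(
        url=url,
        data=gzip.compress(csv_bytes),  # compressed body
        headers={
            "Content-Type": "text/csv",   # format of the uncompressed data
            "Content-Encoding": "gzip",   # tells the server to decompress
            "Accept": "text/csv",         # request CSV output
        },
        method="POST",
    )
```

Compression mainly helps with large, repetitive CSV files.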


###  Query arguments {: #query-arguments }

The `predictions` routes (`POST /predictions` in single-model mode and `POST /deployments/:id/predictions` in multi-model mode) have the same query arguments and HTTP headers as their standard route counterparts, with a few exceptions. As with the regular Dedicated Prediction API, the exact list of supported arguments depends on the deployed model. Below is the list of general query arguments supported by every deployment.


|  Key | Type           | Description  | Example(s) |
|------|----------------|--------------|------------|
| `passthroughColumns` | list of strings | Optional. Controls which columns from a scoring dataset to expose (or to copy over) in a prediction response. <br><br> The request may contain zero, one, or more columns. (There’s no limit on how many column names you can pass.) Column names must be passed as UTF-8 bytes and must be percent-encoded (see the [HTTP standard](https://tools.ietf.org/html/rfc2616){ target=_blank } for this requirement). Make sure to use the exact name of a column as a value. | `/v1.0/deployments/<deploymentId>/predictions?passthroughColumns=colA&passthroughColumns=colB` |
| `passthroughColumnsSet` | string| Optional. Controls which columns from a scoring dataset to expose (or to copy over) in a prediction response. The only possible option is `all` and, if passed, all columns from a scoring dataset are exposed. | `/v1.0/deployments/<deploymentId>/predictions?passthroughColumnsSet=all` |
| `decimalsNumber` | integer | Optional. Configures the precision of floats in prediction results. Sets the number of digits after the decimal point. <br><br> If there are no digits after the decimal point, rather than adding zeros, the float precision will be less than `decimalsNumber`. | `?decimalsNumber=15` |

Note the following:

* You can't pass the `passthroughColumns` and `passthroughColumnsSet` parameters in the same request.
* While there is no limit on the number of column names you can pass with the `passthroughColumns` query parameter, there is a limit on the size of the [HTTP request line](https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1){ target=_blank } (currently 8192 bytes).
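
The encoding and exclusivity rules above can be applied when building the request target. This is a minimal sketch with a hypothetical helper, `predictions_path`. The request-line check approximates the 8192-byte limit by measuring `POST <target> HTTP/1.1`, which is an assumption about how that limit is counted.

```python
from urllib.parse import quote, urlencode

MAX_REQUEST_LINE_BYTES = 8192  # limit quoted in the notes above


def predictions_path(path: str, passthrough_columns=None,
                     passthrough_all: bool = False, decimals_number=None) -> str:
    """Return a request target with percent-encoded query arguments."""
    if passthrough_columns and passthrough_all:
        raise ValueError("passthroughColumns and passthroughColumnsSet are mutually exclusive")
    params = []
    if passthrough_all:
        params.append(("passthroughColumnsSet", "all"))
    for column in passthrough_columns or []:
        params.append(("passthroughColumns", column))  # one pair per column
    if decimals_number is not None:
        params.append(("decimalsNumber", str(decimals_number)))
    # quote() percent-encodes the UTF-8 bytes of each name and value.
    query = urlencode(params, quote_via=quote)
    target = f"{path}?{query}" if query else path
    request_line = f"POST {target} HTTP/1.1"
    if len(request_line.encode("utf-8")) > MAX_REQUEST_LINE_BYTES:
        raise ValueError("request line exceeds 8192 bytes; pass fewer columns")
    return target
```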

### Prediction Explanation parameters {: #prediction-explanation-parameters }

You can parametrize the [Prediction Explanations](dr-predapi#making-prediction-explanations) prediction request with the following query parameters:

!!! note
    To trigger Prediction Explanations, you must send `maxExplanations=N`, where `N` is greater than `0`.


|  Key | Type           | Description  | Example(s) |
|------|----------------|--------------|------------|
| `maxExplanations` | int OR string | Optional. Limits the number of explanations returned by the server. Previously called `maxCodes` (deprecated). For SHAP explanations only, the special constant `all` is also accepted. | <ul><li>`?maxExplanations=5`</li><li>`?maxExplanations=all`</li></ul> |
| `thresholdLow`  | float  | Optional. Prediction Explanation low threshold. Predictions must be below this value (or above the thresholdHigh value) for Prediction Explanations to compute.  | `?thresholdLow=0.678` |
| `thresholdHigh` | float  | Optional. Prediction Explanation high threshold. Predictions must be above this value (or below the thresholdLow value) for Prediction Explanations to compute. |  `?thresholdHigh=0.345` |
| `excludeAdjustedPredictions` | bool | Optional. Includes or excludes exposure-adjusted predictions in prediction responses if exposure was used during model building. The default value is `true` (exclude exposure-adjusted predictions). | `?excludeAdjustedPredictions=true` |
| `explanationNumTopClasses` | int | Optional. Multiclass models only. The number of top predicted classes to explain for each row. Defaults to 1. This parameter is mutually exclusive with `explanationClassNames`. | `?explanationNumTopClasses=5` |
| `explanationClassNames` | list of string types | Optional. Multiclass models only. A list of class names to explain for each row. Class names must be passed as UTF-8 bytes and must be percent-encoded (see the [HTTP standard](https://tools.ietf.org/html/rfc2616){ target=_blank } for this requirement). This parameter is mutually exclusive with `explanationNumTopClasses`. If neither parameter is passed, `explanationNumTopClasses=1` is assumed. | `?explanationClassNames=classA&explanationClassNames=classB` |


###  Time series parameters {: #time-series-parameters }

You can parametrize the time series prediction request using the following query parameters:

|  Key | Type           | Description  | Example(s) |
|------|----------------|--------------|------------|
| `forecastPoint`  | ISO-8601 string | An [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html){ target=_blank } formatted DateTime string, without timezone, representing the [forecast point](glossary/index#forecast-point). This parameter cannot be used if `predictionsStartDate` and `predictionsEndDate` are passed. | `?forecastPoint=2013-12-20T01:30:00Z` |
| `relaxKnownInAdvanceFeaturesCheck` | bool | `true` or `false`. When `true`, missing values for known-in-advance features are allowed in the forecast window at prediction time. The default value is `false`. Note that the absence of known-in-advance values can negatively impact prediction quality. | `?relaxKnownInAdvanceFeaturesCheck=true` |
| `predictionsStartDate`  | ISO-8601 string | The time in the dataset when bulk predictions begin generating. This parameter must be defined together with `predictionsEndDate`. The `forecastPoint` parameter cannot be used if `predictionsStartDate` and `predictionsEndDate` are passed. | `?predictionsStartDate=2013-12-20T01:30:00Z&predictionsEndDate=2013-12-20T01:40:00Z` |
| `predictionsEndDate` | ISO-8601 string | The time in the dataset when bulk predictions stop generating. This parameter must be defined together with `predictionsStartDate`. The `forecastPoint` parameter cannot be used if `predictionsStartDate` and `predictionsEndDate` are passed. | See above. |
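
The constraints between these parameters can be enforced before a request is sent. The sketch below uses a hypothetical helper, `time_series_query`. It formats timestamps with a trailing `Z` to match the examples in the table, which is an assumption about the accepted format.

```python
from datetime import datetime
from urllib.parse import urlencode

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed from the examples in the table


def time_series_query(forecast_point: datetime = None,
                      start: datetime = None, end: datetime = None) -> str:
    """Build a time series query string, rejecting invalid combinations."""
    if forecast_point is not None and (start is not None or end is not None):
        raise ValueError("forecastPoint cannot be combined with a predictions date range")
    if (start is None) != (end is None):
        raise ValueError("predictionsStartDate and predictionsEndDate must be passed together")
    params = {}
    if forecast_point is not None:
        params["forecastPoint"] = forecast_point.strftime(ISO_FORMAT)
    if start is not None:
        params["predictionsStartDate"] = start.strftime(ISO_FORMAT)
        params["predictionsEndDate"] = end.strftime(ISO_FORMAT)
    # Keep ":" unescaped so the output matches the documented examples.
    return urlencode(params, safe=":")
```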

## External configuration {: #external-configuration }

The Docker image can also read the configuration options listed in the [Environment variables](#environment-variables) table from a file at `/opt/ml/config`. The file must contain `<key>=<value>` pairs, where each key name is the corresponding environment variable name.
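
Such a file can be generated from a dictionary of settings. This is a minimal sketch with a hypothetical helper, `render_config_file`. It assumes one `<key>=<value>` pair per line, which this page does not state explicitly.

```python
def render_config_file(settings: dict) -> str:
    """Render settings as <key>=<value> lines (one pair per line, assumed)."""
    lines = []
    for key, value in settings.items():
        text = str(value)
        if "=" in key or "\n" in key or "\n" in text:
            raise ValueError(f"cannot represent setting {key!r} on a single line")
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"
```

Write the result to a local file and make it available inside the container at `/opt/ml/config`.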

## Examples {: #examples }

1. Run with two workers:

       ``` sh
       docker run \
           -v /path/to/mlpkgdir:/opt/ml/model \
           -e PREDICTION_API_WORKERS=2 \
           -e PREDICTION_API_SCORING_MODEL_CACHE_MAXSIZE=32 \
           -e PREDICTION_API_PRELOAD_MODELS_ENABLED='true' \
           -e PREDICTION_API_DEPLOYED_MODEL_RESOLVER_CACHE_TTL_SEC=0 \
           datarobot/datarobot-portable-prediction-api:<version>
       ```

2. Run with external monitoring configured:

       ``` sh
       docker run \
           -v /path/to/mlpkgdir:/opt/ml/model \
           -e PREDICTION_API_MONITORING_ENABLED='true' \
           -e PREDICTION_API_MONITORING_SETTINGS='<settings>' \
           datarobot/datarobot-portable-prediction-api:<version>
       ```

3. Run with internal monitoring configured:

       ``` sh
       docker run \
           -v /path/to/mlpkgdir:/opt/ml/model \
           -e PREDICTION_API_MONITORING_ENABLED='true' \
           -e PREDICTION_API_MONITORING_SETTINGS='<settings>' \
           -e MONITORING_AGENT='true' \
           -e MONITORING_AGENT_DATAROBOT_APP_URL='https://app.datarobot.com/' \
           -e MONITORING_AGENT_DATAROBOT_APP_TOKEN='<token>' \
           datarobot/datarobot-portable-prediction-api:<version>
       ```

4. Run with HTTPS support using default protocols and ciphers:

       ``` sh
       docker run \
           -v /path/to/mlpkgdir:/opt/ml/model \
           -p 8443:8443 \
           -e PREDICTION_API_TLS_ENABLED='true' \
           -e PREDICTION_API_TLS_CERTIFICATE="$(cat /path/to/cert.pem)" \
           -e PREDICTION_API_TLS_CERTIFICATE_KEY="$(cat /path/to/key.pem)" \
           datarobot/datarobot-portable-prediction-api:<version>
       ```

5. Run with the Python 3 interpreter only to minimize RAM footprint:

       ``` sh
       docker run \
           -v /path/to/my_python3_model.mlpkg:/opt/ml/model \
           -e PREDICTION_API_RPC_DUAL_COMPUTE_ENABLED='false' \
           -e PYTHON3_SERVICES='true' \
           datarobot/datarobot-portable-prediction-api:<version>
       ```

6. Run with the Python 2 interpreter only to minimize RAM footprint:

       ``` sh
       docker run \
           -v /path/to/my_python2_model.mlpkg:/opt/ml/model \
           -e PREDICTION_API_RPC_DUAL_COMPUTE_ENABLED='false' \
           datarobot/datarobot-portable-prediction-api:<version>
       ```
